Character based String Kernels for Bio-Entity Relation Detection
Authors
Abstract
Extracting bio-entity relations has emerged as an important task due to the ever-growing number of biomedical documents. In this paper, we present a simple and novel representation for extracting bio-entity relationships. State-of-the-art systems for such tasks rely on word-based representations and variations of linguistically driven features. In contrast, we model bio-text with the most basic character-based string representation and a family of string kernels. This eliminates time-consuming parsing, the problem of rare words, and domain-specific pre-processing. This simple representation makes our approach fast and flexible for any bio-NLP dataset. We demonstrate comparable performance and faster computation than previous state-of-the-art kernel methods.
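As a rough illustration of what a character-based string kernel computes, the sketch below implements a plain k-gram spectrum kernel in Python. The abstract does not say which kernels the family contains, so the spectrum kernel is an assumption chosen because it is a common example. The function names (`char_ngrams`, `spectrum_kernel`, `normalized_kernel`) and the default `k = 3` are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, k: int) -> Counter:
    """Count every contiguous length-k character substring of text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 3) -> float:
    """Inner product of the k-gram count vectors of s and t.

    Assumption: a plain spectrum kernel, used here only as one
    possible member of a string kernel family.
    """
    a, b = char_ngrams(s, k), char_ngrams(t, k)
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller feature map
    return float(sum(count * b[gram] for gram, count in a.items()))


def normalized_kernel(s: str, t: str, k: int = 3) -> float:
    """Cosine-normalized spectrum kernel; 0.0 if either string has no k-grams."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0
```

Because the input is raw characters, no tokenizer or parser is needed. A Gram matrix built from `normalized_kernel` over sentence pairs could then be passed to any kernel classifier that accepts precomputed kernels, such as an SVM.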
Similar resources
Semi-supervised Abstraction-Augmented String Kernel for Multi-level Bio-Relation Extraction
Bio-relation extraction (bRE), an important goal in bio-text mining, involves subtasks that identify relationships between bio-entities in text at multiple levels, e.g., at the article, sentence or relation level. A key limitation of current bRE systems is that they are restricted by the availability of annotated corpora. In this work we introduce a semi-supervised approach that can tackle multi-l...
Learning state machine-based string edit kernels
Over the past few years, several works have derived string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. On the other hand, the marginalized kernels allow the computation of the joint similarity between two instances by summing condit...
Fast Kernels for Inexact String Matching
We introduce several new families of string kernels designed in particular for use with support vector machines (SVMs) for classification of protein sequence data. These kernels – restricted gappy kernels, substitution kernels, and wildcard kernels – are based on feature spaces indexed by k-length subsequences from the string alphabet Σ (or the alphabet augmented by a wildcard character), and h...
Studying Translationese at the Character Level
This paper presents a set of preliminary experiments which show that identifying translationese is possible with machine learning methods that work at the character level, more precisely methods that use string kernels. But caution is necessary because string kernels can very easily introduce confounding factors.
Finite p-groups with few non-linear irreducible character kernels
In this paper, we classify all of the finite p-groups with at most three non-linear irreducible character kernels.
Publication date: 2016